Abstract. A number of representation schemes have been presented for use within 
Learning Classifier Systems, ranging from binary encodings to neural networks. 
This paper presents results from an investigation into using discrete and fuzzy 
dynamical system representations within the XCSF Learning Classifier System. 
In particular, asynchronous Random Boolean Networks are used to represent the 
traditional condition-action production system rules in the discrete case and asyn- 
chronous Fuzzy Logic Networks in the continuous-valued case. It is shown pos- 
sible to use self-adaptive, open-ended evolution to design an ensemble of such 
dynamical systems within XCSF to solve a number of well-known test problems. 



1 Introduction 



Traditionally, Learning Classifier Systems (LCS) [35] use a ternary encoding to gener- 
alize over the environmental inputs and to associate appropriate actions. A number of 
representations have previously been presented beyond this scheme however, including 
real numbers [104], fuzzy logic [98] and artificial neural networks [11]. Temporally dy- 
namic representation schemes within LCS represent a potentially important approach 
since temporal behaviour of such kinds is viewed as a significant aspect of artificial life, 
biological systems, and cognition in general [4]. 

In this paper we explore examples of a dynamical system representation within 
the XCSF Learning Classifier System [105] — termed "Dynamical Genetic Program- 
ming" (DGP) [12]. Traditional tree-based Genetic Programming (GP) [46] has been 
used within LCS both to calculate the action [1] and to represent the condition (e.g., 
[55]). DGP uses a graph-based representation, each node of which is constantly up- 
dated with asynchronous parallelism, and evolved using an open-ended, self-adaptive 
scheme. In the discrete case, each node is a Boolean function and therefore the repre- 
sentation is a form of Random Boolean Network (RBN) (e.g., [41]). In the continuous 
case, each node performs a fuzzy logical function and the representation is a form of 
Fuzzy Logic Network (FLN) (e.g., [45]). We show that XCSF is able to solve a num- 
ber of well-known immediate and delayed reward tasks using this temporally dynamic 
knowledge representation scheme with performance competitive with other representations.
Moreover, we exploit the memory inherent to RBN for the discrete case.



2 Related Work 



2.1 Genetic Programming in Learning Classifier Systems 

A significant benefit of symbolic representations is the expressive power to represent 
relationships between the sensory inputs [62]. LISP S-expressions composed of a set
of Boolean functions (i.e., AND, OR, and NOT) have been used to represent symbolic 
classifier conditions in LCS to solve Boolean Multiplexer and Woods problems [55], 
and to extract useful knowledge in a data mining assay [51]. An analysis of the popula- 
tions [56] has subsequently shown an increasing prevalence of sub-expressions through 
the course of evolution as the system constructs the required building blocks to find 
solutions. However, when logical disjunctions are involved, optimality is unattainable 
because the symbolic conditions highly overlap, resulting in classifiers sharing their fit- 
ness with other classifiers and thereby lowering the fitness values [53]. This was later 
extended to also include arithmetic functions (i.e., PLUS, MINUS, MULTIPLY, DI- 
VIDE, and POWEROF) and domain specific functions (i.e., VALUEAT and ADDROF) 
to solve a number of Multiplexer tasks [38]. 

In addition, Lanzi et al. [52] based classifier conditions on Stack-Based Genetic Pro- 
gramming [73] and solved the 6 and 11 bit Multiplexer as well as Woods 1 problems. 
Here the conditions are linear sequences of tokens, expressed in Reverse Polish Nota- 
tion, where each token represents either a variable, constant or function. The function 
set used comprised Boolean operators (i.e., AND, OR, NOT and EOR) and arithmetic 
operators (i.e., +, -, >, =). 

Ahluwalia and Bull [1] presented a simple form of LCS which used numerical S-expressions
for feature extraction in classification tasks. Here each rule's condition was
a binary string indicating whether or not a rule matched for a given feature and the ac- 
tions were S-expressions which performed a function on the input feature value. More 
recently, Wilson [109] has explored the use of a form of Gene Expression Programming
(GEP) [24] within LCS. Here the expressions are composed of arithmetic functions
and applied to regression tasks. The conditions are represented as expression trees
which are evaluated by assigning the environmental inputs to the tree's terminals, evalu- 
ating the tree, and then comparing the result with a predetermined threshold. Whenever 
the threshold value is exceeded, the rule becomes eligible for use as the output. 

Landau et al. [47] used a purely evolution-based form of LCS (Pittsburgh style [87]) 
in which the rules are represented as directed graphs where the genotypes are tokens of 
a stack-based language, whose execution builds the labeled graph. Bit-strings are used 
to represent the language tokens and applied to non-Markov problems. The genotype 
is translated into a sequence of tokens and then interpreted similarly to a program in 
a stack-based language with instructions to create the graph's nodes, connections and 
labels. Subsequently, the unused conditions and actions in the stack are added to the 
structure which is then popped from the stack. Tokens are used to specify the matching 
conditions and executable actions as well as instructions to construct the graph, and to 
manipulate the stack. The bit-strings were later replaced with integer tokens and again 
applied to non-Markov problems [48]. 



2.2 Graph-based Genetic Programming 

Most relevant to the form of GP used herein is the relatively small amount of prior work 
on graph-based representations. Neural Programming (NP) [91] uses a directed graph 
of connected nodes, each performing an arbitrary function. Potentially selectable func- 
tions include READ, WRITE, and IF-THEN-ELSE, along with standard arithmetic and 
zero-arity functions. Additionally, complex user defined functions may be used. Signifi- 
cantly, recursive connections are permitted and each node is executed with synchronous 
parallelism for some number of cycles before an output node's value is taken. 

Poli (e.g., [77]) presented a similar scheme wherein the graph is placed over a 
two-dimensional grid and executes its nodes synchronously in parallel. Connections 
are directed upwards and are only permitted between nodes situated on adjacent rows; 
however by including identity functions, connections between non-adjacent layers are 
possible and thus any parallel distributed program may be represented. 

Teller and Veloso [92] presented Parallel Algorithm Discovery and Orchestration 
(PADO) which uses an arbitrary directed graph of nodes and an indexed memory. Each 
node in the graph consists of an action and a branch-decision component, with multi- 
ple outgoing branches permitting the various potential flows of control. A stack is used 
from where each program's inputs are drawn and the results pushed. The potentially 
selectable actions are similar to NP and include arithmetic operators, negation, min- 
imum and maximum, and the ability to read from and write to the indexed memory, 
along with non-deterministic and deterministic branching instructions. The graphs are 
executed chronologically for a fixed amount of time with each node selecting the next 
to take control. The output nodes are then averaged giving additional weighting to the 
more recent states. 

Other examples of graph-based GP typically contain sequentially updating nodes, 
e.g., Finite State Machines (e.g., [26]), Cartesian GP [64], Genetic Network Program- 
ming [34], Linear-Graph GP [39], and Graph Structured Program Evolution [84]. Schmidt 
and Lipson [82] have recently demonstrated a number of benefits from graph encodings 
over traditional trees, such as reduced bloat and increased computational efficiency. 

We have recently introduced the use of the graph-based Random Boolean Networks 
within LCS [16,76]. In this paper we extend that work to the most recent form of LCS, 
Wilson's XCSF, and to the continuous-valued domain with fuzzy logical functions. 

2.3 Evolving Discrete Dynamical Systems 

The most common form of discrete dynamical system is the Cellular Automaton (CA) 
[99] which consists of an array of cells (lattice of nodes) where the cells exist in states 
from a finite set and update their states with synchronous parallelism in discrete time. 
Traditionally, each cell calculates its next state depending upon its current state and 
the states of its closest neighbours. That is, CAs may be seen as a graph with a (typ- 
ically) restricted topology. Packard [71] was the first to use evolutionary computing 
techniques to design CAs such that they exhibit a given emergent global behaviour. 
Following Packard [71], Mitchell et al. [65] have investigated the use of a GA to learn 
the rules of uniform binary CAs. As in Packard's work, the GA produces the entries in 
the update table used by each cell, candidate solutions being evaluated with regard to
their degree of success for the given task. Andre et al. [2] used traditional GP to evolve
the update rules and reported similar results to Mitchell et al. [65]. Sipper [85] pre- 
sented a non-uniform, or heterogeneous, approach to evolving CAs. Each cell of a one- 
or two-dimensional CA is also viewed as a GA population member, mating only with 
its lattice neighbours and receiving an individual fitness. He shows an increase in per- 
formance over Mitchell et al. [65] by exploiting the potential for spatial heterogeneity 
in the tasks. In this paper, a more general form of dynamical system is exploited. 

3 Random Boolean Networks 

The discrete dynamical systems known as Random Boolean Networks (RBN) were 
originally introduced by Kauffman (see [41]) to explore aspects of biological genetic 
regulatory networks. Since then they have been used as a tool in a wide range of areas, 
such as self-organisation (e.g., [41]), computation (e.g., [63]), and robotics (e.g.,
[78]).

An RBN typically consists of a network of N nodes, each performing a Boolean 
function with K inputs from other nodes in the network, all updating synchronously 
(see Figure 1). As such, RBN may be viewed as a generalization of binary Cellular 
Automata (CA) [99] and unorganized machines [96]. Since they have a finite number 
of possible states and they are deterministic, the dynamics of RBN eventually fall into 
a basin of attraction. It is well-established that the value of K affects the emergent 
behaviour of RBN wherein attractors typically contain an increasing number of states 
with increasing K. Three phases of behaviour are suggested: ordered when K = 1, with
attractors consisting of one or a few states; chaotic when K > 3, with a very large 
number of states per attractor; and, a critical regime around K = 2, where similar states 
lie on trajectories that tend to neither diverge nor converge and 5-15% of nodes change 
state per attractor cycle (see [41] for discussions of this critical regime, e.g., with respect 
to perturbations). Analytical methods have been presented by which to determine the 
typical time taken to reach a basin of attraction and the number of states within such 
basins for a given degree of connectivity K (see [41]). 
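
The synchronous dynamics described above can be illustrated with a minimal sketch. It assumes a network stored as per-node input lists and truth tables, and all names (`make_rbn`, `sync_step`, `find_attractor`) are hypothetical rather than taken from the original work. Because the state space is finite and the update is deterministic, iterating until a state repeats is enough to find the attractor.

```python
import random

def make_rbn(n, k, rng):
    """Build a random Boolean network: per-node input indices and truth tables.

    Self-connections and repeated inputs are allowed here (an assumption).
    """
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def sync_step(state, inputs, tables):
    """Update every node at once from the previous state (synchronous update)."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for src in node_inputs:
            index = (index << 1) | state[src]   # input bits form the table index
        new_state.append(table[index])
    return tuple(new_state)

def find_attractor(state, inputs, tables):
    """Iterate until a state repeats; return (transient length, cycle length)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = sync_step(state, inputs, tables)
        t += 1
    return seen[state], t - seen[state]

rng = random.Random(0)
inputs, tables = make_rbn(n=13, k=2, rng=rng)
start = tuple(rng.randint(0, 1) for _ in range(13))
transient, cycle = find_attractor(start, inputs, tables)
```

A single node wired to itself with a NOT table, `find_attractor((0,), [[0]], [[1, 0]])`, has no transient and a cycle of two states.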

Closely akin to the work described here, Kauffman [41] describes the use of sim- 
ulated evolution to design RBN which must play a (mis)matching game wherein mu- 
tation is used to change connectivity, the Boolean functions, K and N. He reports the 
typical emergence of high fitness solutions with K=2 to 3, together with an increase 
in N over the initialised size. Sipper and Ruppin [86] extended Sipper's heterogeneous 
CA approach [85] to enable heterogeneity in the node connectivity, along with the node 
function; they evolved a form of Random Boolean Network. Van den Broeck and Kawai 
[10] explored the use of a simulated annealing-type approach to design feedforward 
RBN for the four-bit parity problem and Lemke et al. [59] evolved RBN of fixed N and 
K to match an arbitrary attractor. 

Figure 2 shows the effect of K on a 13 node RBN; results are an average of one 
hundred runs for each value of K. It can be seen that the higher the value of K, the 
greater the number of states the networks will cycle through, as shown by the higher rate 
of change of node states. Further, that after an initial rapid decline in the rate of change, 
this value stabilises as the states fall into their respective attractors. In the synchronous
case (Figure 2a) when K = 2, the number of nodes changing state converges to around
20%, and when K = 3 to just above 35%; thus we can see that the ordered regime occurs
when approximately 20% or fewer nodes are changing state each cycle, and the chaotic
regime occurs for larger rates of change.

Fig. 1: Example Random Boolean Network and node encoding.

As noted above, traditional RBN consist of N nodes updating synchronously in dis- 
crete time steps, but asynchronous versions have also been presented, after [32], leading 
to a classification of the space of possible forms of RBN [28]. Asynchronous forms of 
CA have also been explored (e.g., [37]) wherein it is often suggested that asynchrony 
is a more realistic underlying assumption for many natural and artificial systems since 
"discrete time, synchronously updating networks are certainly not biologically defen- 
sible: in development the interactions between regulatory elements do not occur in a 
lock-step fashion" [110]. 

Asynchronous logic devices are known to have the potential to consume less power 
and dissipate less heat [103], which may be exploitable during efforts towards hardware 
implementations of such systems. Asynchronous logic is also known to have the poten- 
tial for improved fault tolerance, particularly through delay insensitive schemes (e.g., 
[20]). This may also prove beneficial for hardware implementations. 

Harvey and Bossomaier [32] showed that asynchronous RBN exhibit either point 
attractors, as seen in asynchronous CAs, or "loose" attractors where "the network passes 
indefinitely through a subset of its possible states" (as opposed to distinct cycles in the 
synchronous case). Thus the use of asynchrony represents another feature of RBN with 
the potential to significantly alter their underlying dynamics thereby offering another 
mechanism by which to aid the simulated evolutionary design process for a given task. 
Di Paolo [21] showed it is possible to evolve asynchronous RBN which exhibit rhythmic 
behaviour at equilibrium. Asynchronous CAs have also been evolved (e.g., [86]). 



Figure 2b shows the percentage of nodes changing state on each cycle for vari- 
ous values of K on a 13 node asynchronous RBN. It can be seen that, similar to the 
synchronous case (see Figure 2a), the higher the value of K, the greater the number 
of states the networks will cycle through in an attractor. These values are significantly 
lower than in the synchronous case however. For example, when K = 2, approximately 
20% of nodes change each synchronously updated cycle compared with 5% when up- 
dated asynchronously. The difference is to be expected because, in the asynchronous 
case, "the lack of synchronicity increases the complexity of the RBN, enhancing the 
number of possible states and interactions. And this complexity changes the attractor 
basins, transforming and enlarging them. This reduces the number of attractors and 
states in attractors" [28]. As previously mentioned, in the asynchronous case there are 
no cycle attractors, only point and loose attractors. 
Fig. 2: The effect of K on a 13 node Random Boolean Network. 



4 XCSF Overview 

An LCS rule (also termed a classifier) traditionally takes the form of an environment
string consisting of the ternary alphabet {0,1,#}, a binary action string, and subsequent
information including the classifier's expected payoff (reward) P, the error rate $\epsilon$ (in
units of payoff predicted), and the fitness F. The # symbol in the environment condition
provides a mechanism to generalise the inputs received by matching for both logical
0 and 1 for that bit.
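
As a small illustration of ternary matching, the following sketch tests a condition against a binary input string. The function name `matches` and the use of Python strings for conditions are assumptions made for the example only.

```python
def matches(condition: str, state: str) -> bool:
    """Return True if a ternary condition over {0,1,#} matches a binary input."""
    if len(condition) != len(state):
        raise ValueError("condition and state must have the same length")
    # '#' is a wildcard: it accepts either a 0 or a 1 at that position.
    return all(c == "#" or c == s for c, s in zip(condition, state))
```

For example, the condition `1#0` matches both `110` and `100`, but not `011`.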

For each phase in the learning cycle, a match set [M] is generated from the popula- 
tion set [P], comprising all of the classifiers whose environment condition matches the 
current environmental input. In the event that the number of actions present in [M] is 
less than a threshold value, $\theta_{mna}$, covering is used to produce a classifier that matches
the current environment state along with an action assigned randomly from those not
present in [M]; typically $\theta_{mna}$ is set to the maximum number of possible actions so that
there must be at least one classifier representing each action present. 



Subsequently, a system prediction is made for each action in [M], based upon the 
fitness-weighted average of all of the predictions of the classifiers proposing the action. 
If there are no classifiers in [M] advocating one of the potential system actions, cover- 
ing is invoked to generate classifiers that both match the current environment state and 
advocate the relevant action. An action is then selected using the system predictions, 
typically by alternating exploring (by either roulette wheel or random selection) and 
exploiting (the best action). In multi-step problems a biased selection strategy is often 
employed wherein exploration is conducted at probability $p_{explr}$, otherwise exploitation
occurs [50]. An action set [A] is then built comprising all the classifiers in [M] advo- 
cating the selected action. Next, the action is executed in the environment and feedback 
is received in the form of a payoff, P. 
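
The system prediction and the biased action selection can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation: the `Classifier` fields, the function names, and the choice of uniform random exploration are assumptions.

```python
import random
from dataclasses import dataclass

@dataclass
class Classifier:
    action: int
    prediction: float
    fitness: float

def system_predictions(match_set):
    """Fitness-weighted average prediction for each action advocated in [M]."""
    totals, weights = {}, {}
    for cl in match_set:
        totals[cl.action] = totals.get(cl.action, 0.0) + cl.prediction * cl.fitness
        weights[cl.action] = weights.get(cl.action, 0.0) + cl.fitness
    return {a: totals[a] / weights[a] for a in totals if weights[a] > 0.0}

def select_action(predictions, p_explr, rng):
    """Explore with probability p_explr (uniform random action), else exploit."""
    if rng.random() < p_explr:
        return rng.choice(sorted(predictions))
    return max(predictions, key=predictions.get)

def action_set(match_set, action):
    """[A]: all classifiers in [M] that advocate the selected action."""
    return [cl for cl in match_set if cl.action == action]
```

With two equally fit classifiers predicting 100 and 200 for action 0, the system prediction for that action is 150.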

In a single-step problem, [A] is updated using the current reward. The GA is then 
run in [A] if the average time since the last GA invocation is greater than the threshold 
value, $\theta_{GA}$. When the GA is run, two parent classifiers are chosen (typically by roulette
wheel selection) based on fitness. Offspring are then produced from the parents, usually 
by use of recombination and mutation. Typically, the offspring then have their payoff, 
error, and fitness set to the average of their parents'. If subsumption is enabled and an
offspring is subsumed by either parent, it is not included in [P]; instead the parent's
numerosity is incremented. In a multi-step problem, the previous action set $[A]_{-1}$ is
updated using a Q-learning [101] type algorithm and the GA may be run as described
above on $[A]_{-1}$ as opposed to [A] for single-step problems. The sequence then loops
until it is terminated after a predetermined number of problem instances. 

In XCSF each classifier also maintains a vector of a series of weights, where there
are as many weights as there are inputs from the environment, plus one extra, $x_0$. That
is, each classifier maintains a prediction (cl.p) which is calculated as a product of the
environmental input ($s_t$) and the classifier weight vector (w):

$$cl.p(s_t) = cl.w_0 \times x_0 + \sum_{i>0} cl.w_i \times s_t(i) \tag{1}$$

Each of the input weights is initially set to zero, and subsequently adapted to accurately
reflect the prediction using a modified delta rule [66]. The delta rule was modified
such that the correction for each step is proportional to the difference between the current
and correct prediction, and controlled by a correction rate, $\eta$. The modified delta
rule for the reinforcement update is thus:

$$\Delta w_i = \frac{\eta}{|s_t|^2} (P - cl.p(s_t)) \, s_t(i) \tag{2}$$

where $\eta$ is the correction rate and $|s_t|^2$ is the norm of the input vector $s_t$. The values
$\Delta w_i$ are used to update the weights of the classifier cl with:

$$cl.w_i \leftarrow cl.w_i + \Delta w_i \tag{3}$$

Subsequently, the prediction error $\epsilon$ is updated with:

$$cl.\epsilon \leftarrow cl.\epsilon + \beta (|P - cl.p(s_t)| - cl.\epsilon) \tag{4}$$



This enables a more accurate, piecewise-linear, approximation of the payoff (or 
function), as opposed to a piecewise-constant approximation, and can also be applied 
to binary problems such as the Boolean multiplexer and maze environments, resulting 
in faster convergence to optimality as well as a more compact rule-base [60]. See [106] 
for further details. 
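
The prediction and update rules (1)-(4) of this section can be sketched in a few lines. Several details here are assumptions rather than statements from the text: the input is augmented with the constant x0 so that it pairs with the first weight, the norm is taken over that augmented vector, and the error is updated using the prediction computed before the weights change. The function names are hypothetical.

```python
def predict(weights, s, x0=1.0):
    """Equation (1): linear prediction from the weight vector and input s."""
    return weights[0] * x0 + sum(w * x for w, x in zip(weights[1:], s))

def update(weights, error, s, payoff, eta=0.2, beta=0.2, x0=1.0):
    """One reinforcement update, equations (2)-(4).

    Returns (new_weights, new_error); the inputs are left unmodified.
    """
    x = [x0] + list(s)                  # augmented input (assumption)
    norm_sq = sum(v * v for v in x)     # squared length of the augmented input
    p = predict(weights, s, x0)         # prediction before the update
    delta = [(eta / norm_sq) * (payoff - p) * v for v in x]      # eq. (2)
    new_weights = [w + d for w, d in zip(weights, delta)]        # eq. (3)
    new_error = error + beta * (abs(payoff - p) - error)         # eq. (4)
    return new_weights, new_error
```

Starting from zero weights with input `[1.0, 0.0]` and a payoff of 1000, one update moves the prediction for that input from 0 to roughly 200, that is, a fraction eta of the way to the target.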

5 Discrete DGP-XCSF 

To use asynchronous RBN as the rules within XCSF (see example rule in Figure 3), the 
following scheme is adopted. Each of an initial randomly created rule's nodes has K 
randomly assigned connections, here 1 < K < 5. There are initially as many nodes N as 
input fields I for the given task and its outputs O, plus one other, as will be described,
i.e., N = I + O + 1. The first connection of each input node is set to the corresponding
locus of the input message. The other connections are assigned at random within the 
RBN as usual. In this way, the current input state is always considered along with the 
current state of the RBN itself per network update cycle by such nodes. Nodes are 
initialised randomly each time the network is run to determine [M], etc. The population 
is initially empty and covering is applied to generate rules as in the standard XCSF 
approach. 

Matching consists of executing each rule for T cycles based on the current input. 
The value of T is chosen to be a value typically within the basin of attraction of the 
RBN. Asynchrony is here implemented as a randomly chosen node being updated on a 
given cycle, with as many updates per overall network update cycle as there are nodes 
in the network before an equivalent cycle to one in the synchronous case is said to have 
occurred. See [28] for alternative schemes. 
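
One way to read the asynchronous scheme above is sketched below: a network update cycle consists of N single-node updates, each applied to a randomly chosen node and immediately visible to later updates. The data layout, the function names, and the convention that a negative connection index `-(j+1)` reads bit `j` of the input message are all assumptions made for the example.

```python
import random

def async_cycle(state, inputs, tables, external, rng):
    """One network update cycle: N single-node updates on random nodes."""
    n = len(state)
    for _ in range(n):
        node = rng.randrange(n)
        index = 0
        for src in inputs[node]:
            # Non-negative indices read node states; negative ones read the
            # external input message (a convention assumed for this sketch).
            bit = state[src] if src >= 0 else external[-src - 1]
            index = (index << 1) | bit
        state[node] = tables[node][index]   # later updates see the new value
    return state

def run_rule(state, inputs, tables, external, cycles, rng):
    """Execute a rule for T cycles, recording the state after each cycle."""
    history = []
    for _ in range(cycles):
        async_cycle(state, inputs, tables, external, rng)
        history.append(list(state))
    return history
```

Note that, with random selection, some nodes may be updated more than once in a cycle and others not at all.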

In this study, where well-known maze problems are explored there are eight possible 
actions and accordingly three required output nodes. An extra "matching" node is also 
required to enable RBNs to (potentially) only match specific sets of inputs. If a given 
RBN has a logical '0' on the match node, regardless of its output node's state, the rule 
does not join [M]. This scheme has also been exploited within neural LCS [11]. A 
'windowed approach' is utilised where the output is decided by the most common state 
over the last W steps up to T. For example, if the last few states of a node
prior to cycle T are 0101001 and W = 3, then the node's final state would be '0' and
not '1'.
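
The windowed readout and the role of the match node can be sketched as follows. The function names, the per-node history lists, and the tie-breaking rule (ties resolve to 0) are assumptions for illustration.

```python
def windowed_state(history, w):
    """Most common value among the last w recorded states of one node."""
    if not 1 <= w <= len(history):
        raise ValueError("w must satisfy 1 <= w <= len(history)")
    window = history[-w:]
    return 1 if 2 * sum(window) > w else 0   # ties resolve to 0 (assumption)

def rule_output(node_histories, w, match_node, output_nodes):
    """Return the binary-encoded action, or None if the match node reads 0."""
    if windowed_state(node_histories[match_node], w) == 0:
        return None                          # the rule does not join [M]
    action = 0
    for node in output_nodes:
        action = (action << 1) | windowed_state(node_histories[node], w)
    return action
```

For the example in the text, `windowed_state([0, 1, 0, 1, 0, 0, 1], 3)` gives 0 even though the most recent state is 1.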

When covering is necessitated, a randomly constructed RBN is created and then 
executed for T cycles to determine the status of the match and output nodes. This pro- 
cedure is repeated until an RBN is created that matches the environment state. 

Self-adaptive mutation affecting a variable length representation was first explored 
by Fogel et al. [25] where a self-adaptive value was used to control the deletion rate 
of states within finite state machines. Furthermore, Ghozeil and Fogel [30] used self- 
adaptive mutation to control the rate of addition and deletion of hyperboxes to cluster 
spatial data. Self-adaptive mutation was first applied within LCS by Bull et al. [13] 
where each rule maintains its own mutation rate $\mu$. Self-adaptive mutation affecting
rule size was first used in LCS with a neural representation [14]. This is similar to the 
approach used in Evolution Strategies (ES) [83] where the mutation rate is a locally
evolving entity in itself, i.e., it adapts during the search process. Self-adaptive mutation
not only reduces the number of hand-tunable parameters of the evolutionary algorithm,
it has also been shown to improve performance.

Fig. 3: An evolved dDGP-XCS 6-bit MUX asynchronous rule.

Following [14], mutation only is used here. A node's truth table is represented by
a binary string and its connectivity by a list of K integers in the range [1, N]. Since
each node has a given fixed K value, each node maintains a binary string of length $2^K$
which forms the entries in the look-up table for each of the possible $2^K$ input states of
that node, i.e., as in the aforementioned work [71] on evolving CAs, for example. These
strings are subjected to mutation on reproduction at the self-adapting rate for that rule.
Hence, within the RBN representation, evolution can define different Boolean functions
for each node within a given network rule, along with its connectivity map. Specifically,
each rule has its own mutation rate stored as a real number and initially seeded uniform
randomly in the range [0.0, 1.0]. This parameter is passed to its offspring. The offspring
then applies its mutation rate to itself using a Gaussian distribution, i.e., $\mu' = \mu e^{N(0,1)}$,
before mutating the rest of the rule at the resulting rate. Due to the need for a possibly
different number of nodes within the rules for a given task, the DGP scheme is also of
variable length. Once the truth table and connections have been mutated, a new randomly
connected node is either added or the last added node is removed with the same
probability $\mu$. The latter case only occurs if the network currently consists of more than
the initial number of nodes. In addition, each rule maintains its own T value which is
initially seeded randomly between 1 and 50. Thereafter, offspring potentially increment
or decrement T by 1 at probability $\mu$. W is evolved in a similar fashion, however it is
initially seeded between 0 and T, and cannot be greater than T. Thus DGP is temporally
dynamic both in the search process and the representation scheme.
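
The self-adaptive part of this scheme can be sketched as below. This is a partial illustration only: it covers the rate update, truth-table mutation, and the T and W adjustments, and it omits connection rewiring and the addition or removal of nodes. The clamp that keeps the rate at or below 1, the lower bounds on T and W, and all function names are assumptions.

```python
import math
import random

def self_adapt(mu, rng):
    """mu' = mu * e^{N(0,1)}, clamped to at most 1 so it stays a probability."""
    return min(1.0, mu * math.exp(rng.gauss(0.0, 1.0)))

def mutate_bits(bits, mu, rng):
    """Flip each truth-table bit independently with probability mu."""
    return [b ^ 1 if rng.random() < mu else b for b in bits]

def mutate_cycles(t, w, mu, rng, t_min=1, w_min=1):
    """Step T and then W by +/-1 with probability mu each, keeping W <= T.

    The lower bounds t_min and w_min are assumptions of this sketch.
    """
    if rng.random() < mu:
        t = max(t_min, t + rng.choice((-1, 1)))
    if rng.random() < mu:
        w += rng.choice((-1, 1))
    w = min(max(w, w_min), t)
    return t, w

rng = random.Random(42)
child_mu = self_adapt(0.5, rng)                    # offspring adapts its own rate first
child_table = mutate_bits([0, 1, 1, 0], child_mu, rng)
child_t, child_w = mutate_cycles(30, 17, child_mu, rng)
```

The ordering matters: the offspring first perturbs its inherited rate and only then uses the new rate to mutate the rest of the rule, so rates that produce fitter offspring are inherited along with them.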

Whenever an offspring classifier is created and no changes occur to its RBN when 
undergoing mutation, the parent's numerosity is increased and mutation rate set to that 
of the offspring. 

6 Discrete DGP-XCSF Experimentation 

The simplest form of short-term memory is a fixed-length buffer containing the n most 
recent inputs; a common extension is to then apply a kernel function to the buffer to en- 
able non-uniform sampling of the past values, e.g. an exponential decay of older inputs 
[68]. However it is not clear that biological systems make use of such shift registers. 
Registers require some interface with the environment which buffers the input so that it 
can be presented simultaneously. They impose a rigid limit on the duration of patterns, 
defining the longest possible pattern and requiring that all input vectors be of the same 
length. Furthermore, such approaches struggle to distinguish relative temporal position 
from absolute temporal position [23]. 

Whereas many GP systems are expression based, some have also utilised a form 
of memory or state. For example, Linear GP [6]; indexed memory, e.g., [90], [9], and 
[3]; and work on evolving data structures which maintain internal state, e.g., [49]. In 
addition, some systems have used (instead of evolved) data structures to manipulate 
the internal state, e.g., PushGP [88]. Recently, Poli et al. [75] explored the use of soft 
assignment and soft return operations as forms of memory within linear and tree-based 



GP. For soft assignment, they replaced the traditional (entirely destructive) method of 
variable assignment with one of merging new values with previous ones, instead of 
overwriting them. To achieve this, the new value becomes a weighted average of the old 
register value with the new value to be assigned, i.e., V resll [ t = y\>new + (1 — y)v id where 
7 is a value in the range [0,1] specifying the assignment "hardness". For soft return 
operations, tree function nodes return a weighted average of their first argument with 
the result of the corresponding calculation, i.e., OUT = yF(IN\,IN%,...) + (1 - y)lN\ 
where IN„ is an input to a function, F. 
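
The two soft operations are simple weighted averages, as the following sketch shows. The function names are hypothetical and the code illustrates only the formulas quoted above, not the GP systems of [75].

```python
def soft_assign(old, new, gamma):
    """Blend a new value into a register.

    gamma = 1 overwrites the old value; gamma = 0 leaves it unchanged.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * new + (1.0 - gamma) * old

def soft_return(func, gamma, *args):
    """Blend a function's result with its first argument."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * func(*args) + (1.0 - gamma) * args[0]
```

With a hardness of 0.5, assigning 4.0 to a register holding 2.0 leaves it at 3.0, so part of the previous value is retained as a form of memory.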

Here we explore the hypothesis that content-addressable memory is inherent within
synchronous RBN, due to the different possible routes to a basin of attraction [110],
and extend it to the asynchronous case by maintaining the node states across each
input-update-output cycle. A significant advantage of this approach is that each rule/network's
short-term memory is variable-length and adaptive, i.e., the networks can adjust the 
memory parameters, selecting within the limits of the capacity of the memory, what 
aspects of the input sequence are available for computing predictions [68]. In addition, 
as we use open-ended evolution, the maximum size of the short-term memory is also 
open-ended, increasing as the number of nodes within the network grows. 

Here, nodes are initialised at random for the initial random placing in the maze 
but thereafter they are not reset for each subsequent matching cycle. Consequently, 
each network processes the environmental input and the final node states then become 
the starting point for the next processing cycle, whereupon the network receives the 
new environmental input and places the network on a trajectory toward a (potentially) 
different locally stable limit point. Therefore, a network given the same environmental 
input (i.e., the agent's current maze perception) but with different initial node states 
(representing the agent's history through the maze) may fall into a different basin of 
attraction (advocating a different action). Thus the rules' dynamics are (potentially) 
constantly affected by the inputs as the system executes. 

We now apply dDGP-XCSF to two well-known multi-step non-Markov maze environments
that require memory to resolve perceptual aliasing: Woods101 (see Figure 4a)
and Woods102 (see Figure 4b).

Each cell in the maze environments is encoded with two binary bits, where white 
space is represented as a '*', obstacles as 'O', and food as 'F'. Furthermore, actions are 
encoded in binary as shown in Figure 4c. The task is simply to find the shortest path to 
the food (F) given a random start point. Obstacles (O) represent cells which cannot be 
occupied. In Woods 1 the optimal number of steps to the food is 1.7, in Maze4 optimal 
is 3.5 steps, in Woods101 it is 2.9, and in Woods102 it is 3.23. A teletransportation 
mechanism is employed whereby a trial is reset if the agent has not reached the goal 
state within 50 discrete movements. 

6.1 Woods101 

The Woods101 maze (see Figure 4a) is a non-Markov environment containing two communicating
aliasing states, i.e., two positions which border on the same non-aliasing
state and are identically sensed, but require different optimal actions. Thus, to solve this
maze optimally, a form of memory must be utilised (with at least two internal states).
Optimal performance has previously been achieved in Woods101 through the addition
of a memory register mechanism in LCS [57], by a Corporate LCS using rule-linkage
[94], and by a neural LCS using recurrent links [14]. Furthermore, in a proof of concept
experiment, the cyclical directed graph from neural programming has been shown capable
of representing rules with memory to solve Woods101, however it was only found
to do so twice in fifty experiments [5].

(a) Woods101 Environment. Optimal number of steps is 2.9.
(b) Woods102 Environment. Optimal number of steps is 3.23.
Fig. 4: Experimental Maze Environments and Encoding.

Figure 5 shows the performance of dDGP-XCSF in the Woods101 environment
with P = 2000, $\nu = 5$, $\theta_{GA} = 25$, $\theta_{del} = 20$, $\beta = 0.2$, $\eta = 0.2$, $x_0 = 1$, $p_{explr} = 1.0$,
and $N_{init} = 20$ (16 inputs, 3 outputs, 1 match node). Here, optimality is observed after
approximately 6,000 trials (Figure 5a). This is similar to the performance of LCS using
a 1-bit memory register (approximately 7,000 trials, P = 800) [57]. The number of macro-classifiers
in the population converges to around 1800 (Figure 5b). Furthermore, the average number
of nodes in the networks increases by almost one and the number of connections
declines fractionally (Figure 5c). The mutation rate (also Figure 5b) declines rapidly
from approximately 35% to its lowest point, 1.2%, around the six thousandth trial, which is
the same moment at which optimal performance is observed. Lastly, Figure 5c shows
that the first thousand trials see a rapid increase in the number of cycles, T, (30.6 to
34.4) and a rapid decrease in the value of W (17 to 14.7). Subsequently, T continues to
increase (although at a much slower rate) along with the average number of nodes in
the networks; W remains stable at just under 15.

6.2 Woods102 

The Woods102 maze (see Figure 4b) is a non-Markov environment containing aliasing 
conglomerates, i.e., adjacent aliasing states. The introduction of aliasing conglomerates 
significantly increases the complexity of the learning task facing the agent. "It would 
appear that three memory-register bits are required to resolve [the] perceptual aliasing. 
However, since the two situations occur in separate parts of the environment, there is 
the possibility that an optimal policy could evolve in which certain register bits are used 
in more than one situation, thus requiring fewer bits in all. It is therefore not clear how 
large a bit-register is strictly necessary" [57]. However, in practice, register redundancy 
was found to be important and an 8-bit memory register was required within LCS to 
solve the maze optimally, with 2- and 4-bit registers achieving only 4 and 3.7 steps 
respectively (ibid.). Figure 6 shows the performance of dDGP-XCSF in Woods102 with 
the same parameters used in the prior experiment, except that here p_explr = 0.1 and 
P = 20,000. Although a population size of 20,000 may seem disproportionate, a population 
of 2,000 classifiers was required for Woods101, representing a scale-up of 10×, which 
can be compared with the increase required by LCS with a memory register (800 to 
6,000, or 7.5×), where the potential number of internal actions required rises from 
3^1 = 3 to 3^8 = 6561 (ibid.); thus resources are clearly not increasing as quickly as the 
search space. 
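
The scale-up comparison above can be checked with a few lines of arithmetic. This is only a worked restatement of the figures quoted in the text; the variable names are illustrative and not taken from any implementation.

```python
# Population scale-up versus growth of the internal-action space,
# using only the figures quoted in the text.
dgp_scale = 20_000 / 2_000        # dDGP-XCSF: Woods101 -> Woods102
register_scale = 6_000 / 800      # memory-register LCS: 1-bit -> 8-bit
actions_1bit = 3 ** 1             # potential internal actions, 1-bit register
actions_8bit = 3 ** 8             # potential internal actions, 8-bit register
search_space_growth = actions_8bit / actions_1bit

print(f"dDGP-XCSF population scale-up:      {dgp_scale:g}x")
print(f"register LCS population scale-up:   {register_scale:g}x")
print(f"internal-action space growth:       {search_space_growth:g}x")
```

In both systems the population grows by roughly an order of magnitude, while the internal-action space of the register-based system grows by more than three orders of magnitude.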

Optimality is observed after approximately 80,000 trials (Figure 6a); this is slower 
than LCS with an explicit 8-bit memory register (~30,000 trials, P = 6000) [57]. However, 
here the size of the memory did not need to be predetermined as it is inherent 
within the networks, and the action selection policy remains constant, with constant GA 
activity, unlike in [57]. The number of macro-classifiers in the population converges to 
around 17,750 (Figure 6b). Furthermore, the average number of nodes in the networks 
increases fractionally to 20.6 and the number of connections declines on average from 
2.95 to 2.82 (Figure 6c). The mutation rate (Figure 6b) declines rapidly over the first 
40,000 trials from 32% to 5% and reaches its lowest point, 3.5%, at 100,000 trials. 
Lastly, from Figure 6c it can be seen that on average T increases from 30 to 35 and W 
from 17.5 to 20.5. 



7 Continuous Dynamical Systems 

Continuous network models of Genetic Regulatory Networks (GRN) are an extension 
of Boolean networks where nodes still represent genes, and the connections between 
them regulate the influence on gene expression. Differential equations wherein gene 
interactions are incorporated as logical functions are a typical approach [31,40]. 

There is a growing body of work exploring the evolution of different forms of such 
continuous-valued GRN. For example, Knabe et al. [42] devised a model that allows the 
grouping of inputs to a node and is formally closer to a higher order recurrent neural 
network. This was later used to model the evolution of cellular differentiation [43] and 
multicellular morphogenesis [44]. Another model is the Dynamic Recurrent Gene 
Network (DRGN) [27], which consists of a fully connected network of N nodes, each with a 
continuous activation state in the range [0,1], updated synchronously. Here a distinction 
is made between structural nodes (i.e., nodes that specify the current state but have no 
regulatory output) and regulatory nodes (i.e., nodes that only play a regulatory role). A 
single input node is used to specify the relative position of the cell in the lineage. To 
simulate the development of an organism, the node activations and the relative position 
input are initialised. Subsequently, cell division occurs through repeatedly duplicating 
the network, adjusting the relative positions in each network, and updating the states. 
The network weights are adapted through the use of an evolutionary algorithm. 

Furthermore, Dynamic Bayesian Networks (DBN) [29] combine Bayesian Net- 
works (BN) [72] with features of Hidden Markov Models [22], incorporating feedback 
cycles that simulate the temporal evolution of the network. DBN provide a stochastic 
model where both discrete and continuous states are possible. Heuristics are used to 
learn the connectivity map and create additional hidden nodes. DBN have been shown 
to generalise many of the GRN models including RBN (see [69]). 

Fuzzy Cellular Automata (Fuzzy-CA) [102] are an extension of Boolean Cellular 
Automata (CA) and consist of an array of cells (lattice of nodes) where the cells exist in 
real-valued states in the range [0,1] and (typically) update their states with synchronous 
parallelism in discrete time. Traditionally, each cell calculates its next state depending 
upon its current state and the states of its closest neighbours. That is, Fuzzy-CA may be 
seen as a graph with a (typically) restricted topology. Since both transition and output 
functions are replaced by fuzzy relations, Fuzzy-CA include deterministic and 
non-deterministic finite automata as special cases [81] and were initially applied to pattern 
recognition and automatic control problems [102]. 

Following Cattaneo et al. [19], Reiter [80] investigated the effect of the fuzzy 
background on the dynamics of cellular automata with various fuzzy logic sets. They found 
that the choice of logic used leads to significantly different behaviours. For example, 
applying the various logical functions to create fuzzy versions of the Game of Life, it was 
noted that certain sets of logics generated Fuzzy-CA that tended toward homogeneous 
fuzzy behaviour, whereas others were consistent with chaotic or complex behaviour. 

8 Evolving Fuzzy Systems 

Fuzzy set theory [111] is a generalization of Boolean logic wherein continuous 
variables can partially belong to sets. A fuzzy set is defined by a membership function, 
typically with values in the range [0,1], that determines the degree to which a value 
belongs to that set. Fuzzy set theory has been successfully applied to myriad engineering, 
medical, business, and natural science problems. 

Genetic Fuzzy Systems (GFS) [33] use GAs to optimise a fuzzy rule based system 
composed of "IF-THEN" rules, whose antecedents and consequents comprise fuzzy 
logic statements from fuzzy set theory. The first application of the GA-only, i.e., Pitts- 
burgh, approach to learning a fuzzy rule base was by Thrift [93]. 

Valenzuela-Rendon [97] provided the first use of the Michigan approach for re- 
inforcement learning with an evolving set of fuzzy rules. This was later extended to 
enable delayed-reward reinforcement learning [97,7], including continuous multi-step 
problems using continuous vector actions [79]. Fuzzy logic has been used in accuracy- 
based LCS for single-step reinforcement learning [18] and for data mining on several 
UCI data sets [61]. In addition, fuzzy logic has been used under an LCS supervised 
learning scheme for data mining on UCI data sets [70] and for epidemiologic classification 
[100]. 

Aside from using LCS, alternative rule-like approaches have been applied; for 
example, [74] used a GA to modify a fuzzy relational matrix of a one-input, one-output 
fuzzy model. 

By combining fuzzy logic with neural networks, neurons can deal with imprecision 
[58]. Bull and O'Hara [15] presented a form of fuzzy representation within LCS 
using Radial Basis Function neural networks (RBF) [67] to embody each 
condition-action rule, i.e., a simple class of neural-fuzzy hybrid system. Furthermore, Su et 
al. [89] explored a similar representation based on RBF within LCS. However, here the 
contribution of each rule is determined by its strength (which is updated by a fuzzy 
bucket brigade algorithm) as well as the extent to which the antecedent matches the 
environment. Furthermore, in contrast to Bull and O'Hara [15], each condition-action 
rule corresponds to a hidden node instead of a fully-connected network and rules are 
added incrementally instead of being evolved through the GA. To date, only the use of 
RBF has been explored as a neuro-fuzzy hybrid representation within LCS. 

9 Fuzzy Logic Networks 

Fuzzy Logic Networks (FLN) [45,17] can be seen as a generalization of both 
Fuzzy-CA and RBN, where the Boolean functions from RBN are replaced with fuzzy logical 
functions from fuzzy set theory. Thus, FLN generalize RBN through a continuous 
representation and generalize Fuzzy-CA through a less restricted graph topology. Kok and 
Wang [45] explored 3-gene regulation networks using FLN and found that not only 
were FLN able to represent the varying degrees of gene expression but also that the 
dynamics of the networks were able to mimic a cell's irreversible changes into an invariant 
state or progress through a periodic cycle. 

FLN are defined as follows. Given a set of N variables (genes), 

$$F(t) = (F_1(t), F_2(t), \ldots, F_N(t)), \quad F_i(t) \in [0,1] \; (i = 1, 2, \ldots, N) \tag{5}$$

where the index t represents time, the variables are updated by means of the dynamic 
equations 

$$F_i(t+1) = A_i(F_{i1}(t), F_{i2}(t), \ldots, F_{iK}(t)) \tag{6}$$

where A_i is a randomly chosen fuzzy logical function. The total number of choices for 
fuzzy logical functions is decided only by the number of inputs. If a node has K 
(1 ≤ K ≤ N) inputs, then there are 2^K different fuzzy logical functions. In the definition 
of FLN, each node, F_i(t), has K inputs (see Figure 7). The membership function is defined 
as a function A_u : U → [0, 1], where A_u is the degree of membership [17]. In all work 
thus far, all nodes are updated simultaneously, i.e., synchronously. 
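
The synchronous update in Eq. (6) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the helper names (`make_random_fln`, `synchronous_step`) are hypothetical, connections are stored as 0-based indices, and only the Max/Min OR and AND are offered as node functions.

```python
import random
from typing import Callable, List, Sequence, Tuple

FuzzyFn = Callable[[Sequence[float]], float]


def fuzzy_or(inputs: Sequence[float]) -> float:
    """Max/Min fuzzy OR applied across all inputs."""
    return max(inputs)


def fuzzy_and(inputs: Sequence[float]) -> float:
    """Max/Min fuzzy AND applied across all inputs."""
    return min(inputs)


def make_random_fln(
    n: int, k: int, functions: Sequence[FuzzyFn], rng: random.Random
) -> Tuple[List[float], List[List[int]], List[FuzzyFn]]:
    """Create N random states in [0, 1), K random inputs per node (k >= 1),
    and one randomly chosen fuzzy function per node."""
    states = [rng.random() for _ in range(n)]
    connections = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    node_functions = [rng.choice(functions) for _ in range(n)]
    return states, connections, node_functions


def synchronous_step(
    states: Sequence[float],
    connections: Sequence[Sequence[int]],
    node_functions: Sequence[FuzzyFn],
) -> List[float]:
    """Eq. (6): every F_i(t+1) is computed from the states at time t."""
    return [
        fn([states[j] for j in conns])
        for fn, conns in zip(node_functions, connections)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    states, conns, fns = make_random_fln(5, 2, [fuzzy_or, fuzzy_and], rng)
    for _ in range(3):
        states = synchronous_step(states, conns, fns)
    print(states)
```

Because the new state list is built entirely from the old one, no node sees a partially updated network, which is what distinguishes this scheme from asynchronous updating.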



[Figure 7 inset: Node 2 encoding. Connections (x, y): [1, 3]. Function: min(1, x + y). 
Current state: 0.283331.] 

Fig. 7: Example Fuzzy Logic Network and node encoding. Node 2 receives inputs from 
nodes 1 and 3 and performs a Fuzzy OR. 



A number of different fuzzy logic sets have been introduced since the original 
Max/Min method was proposed. Other commonly used fuzzy logics include CFMQVS, 
Probabilistic, MV, and gcd/lcm [80]. As previously mentioned, the choice of fuzzy set 
can result in significantly different behaviour. Therefore, in this paper a range of the 
most commonly used logics is potentially selectable (see Table 1), leaving evolution to 
identify the most appropriate combinations for a given problem. 

As previously mentioned, FLN are typically updated synchronously; however, 
asynchronous schemes in CA, RBN, and Fuzzy-CA have been shown to provide a number 
of benefits, such as modeling the dynamics of GRN more realistically. Figure 8a shows 
the effect of K on a 13 node FLN updated asynchronously and Figure 8b when updated 
synchronously; results are an average of one hundred experiments for each value of K. 
In contrast to RBN, where larger K results in an increased percentage of nodes changing 
state per update cycle [41], it can be seen that with FLN the greater the value of K, the 
smaller the number of states the networks will cycle through within an attractor. This is 
due to the tendency of the fuzzy logic functions to gravitate to extremes (i.e., 0 or 1) with 


Table 1: Selectable Fuzzy Logic Functions 

ID  Function                               Logic 
0   Fuzzy OR (Max/Min)                     max(x, y) 
1   Fuzzy AND (CFMQVS and Probabilistic)   x × y 
2   Fuzzy AND (Max/Min)                    min(x, y) 
3   Fuzzy OR (CFMQVS and MV)               min(1, x + y) 
4   Fuzzy NOT                              1 - x 
5   Identity                               x 



increased inputs. After an initial rapid decline in the rate of change, the networks begin 
to stabilise as the states fall into their respective attractors. However, similarly to RBN, 
it can be seen that an asynchronous updating scheme results in a lower percentage of nodes 
changing state when compared to the synchronous case. In the asynchronous case, when 
K = 2, the number of nodes changing state converges to around 10% compared with 
30% of synchronous nodes, and when K = 5 to approximately 2.5% compared with 7% 
of nodes in the synchronous case. 

Fig. 8: The effect of K on a 13 node Fuzzy Logic Network. 



10 Fuzzy DGP-XCSF 

Whereas FLN have been used previously to model aspects of GRN, no prior studies 
have explored the evolution of the networks for computation. Furthermore, all prior 
studies have only considered a synchronous updating scheme. To use asynchronous 
FLN as the rules within XCSF (hereinafter, "fDGP-XCSF"), the following scheme is 
adopted. Each node of an initial, randomly created rule has K randomly assigned 
connections, here 0 ≤ K ≤ 5, where a node with K = 0 thus retains a constant node 
state. There are initially as many nodes N as input fields I for the given task and its 
outputs O, plus one other, for matching, i.e., N = I + O + 1. The first connection of each 
input node is set to the corresponding locus of the input message. The other connections 
are assigned at random within the FLN. Node states are initialised at random for the first 
step of a trial but thereafter they are not reset for each subsequent matching cycle. The 
population is initially empty and covering is applied to generate rules as in the standard 
XCSF approach. 

If a given FLN has a (real) value of less than 0.5 on the match node, regardless of 
the state of its outputs, the rule does not join [M] (see Figure 7). This scheme has also 
been exploited within neural LCS [11]. The output nodes are discretised in a similar 
fashion, where a state less than 0.5 translates to a binary 0, otherwise 1. Furthermore, 
a windowed approach is utilised whereby the final state of each node is calculated as an 
average over the last W cycles up to T. 
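
The matching and windowing scheme described above might be sketched as below. Several details are assumptions made purely for illustration and are not stated in the text: one "cycle" is taken to be N single-node updates with nodes drawn uniformly at random with replacement, connections are 0-based indices, and the environmental input is omitted. The function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

FuzzyFn = Callable[[Sequence[float]], float]


def run_async_windowed(
    states: List[float],
    connections: Sequence[Sequence[int]],
    node_functions: Sequence[FuzzyFn],
    T: int,
    W: int,
    rng: random.Random,
) -> List[float]:
    """Run T asynchronous cycles (updating `states` in place) and return each
    node's state averaged over the last W cycles."""
    if not 1 <= W <= T:
        raise ValueError("require 1 <= W <= T")
    n = len(states)
    sums = [0.0] * n
    for cycle in range(T):
        # Assumption: one cycle = n random single-node updates.
        for _ in range(n):
            i = rng.randrange(n)
            inputs = [states[j] for j in connections[i]]
            if inputs:  # a node with K = 0 keeps a constant state
                states[i] = node_functions[i](inputs)
        if cycle >= T - W:  # accumulate only over the final W cycles
            for i in range(n):
                sums[i] += states[i]
    return [s / W for s in sums]


def matches(averaged: Sequence[float], match_node: int) -> bool:
    """A rule joins [M] unless its match node averages below 0.5."""
    return averaged[match_node] >= 0.5


def discretise(averaged: Sequence[float], output_nodes: Sequence[int]) -> List[int]:
    """Map each averaged output state to a binary 0 (< 0.5) or 1."""
    return [0 if averaged[i] < 0.5 else 1 for i in output_nodes]
```

Since `states` is modified in place and never reset inside the function, calling it again for the next input continues from the previous final states, in line with node states not being reset between matching cycles.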

A node's function is represented by an integer which references the appropriate 
operation to execute upon its received inputs (see Table 1 for the fuzzy functions used). 
Further, each node's connectivity is represented as a list of MAX_K integers (here 
MAX_K = 5) in the range [0,N], where 0 represents no input to be received on that 
connection. Each integer in the list is subjected to mutation on reproduction at the 
self-adapting rate μ for that rule. Hence, within the representation, evolution can select 
different fuzzy logic functions for each node within a given network rule, along with its 
connectivity map. 
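
A node encoding of this kind could look like the following sketch. The `Node` class and helper names are hypothetical; resampling a mutated integer uniformly from its legal range, and mutating the function ID at the same rate as the connections, are assumptions made for illustration. Self-adaptation of μ itself is not shown.

```python
import random
from dataclasses import dataclass
from typing import List

MAX_K = 5           # maximum connections per node, as in the text
NUM_FUNCTIONS = 6   # function IDs 0..5, as listed in Table 1


@dataclass
class Node:
    function_id: int        # index into the table of fuzzy functions
    connections: List[int]  # MAX_K integers in [0, N]; 0 means "no input"
    state: float            # current real-valued state in [0, 1]


def random_node(n_nodes: int, rng: random.Random) -> Node:
    """Create a node with K random connections (0 <= K <= MAX_K), zero-padded."""
    k = rng.randint(0, MAX_K)
    conns = [rng.randint(1, n_nodes) for _ in range(k)] + [0] * (MAX_K - k)
    return Node(rng.randrange(NUM_FUNCTIONS), conns, rng.random())


def active_inputs(node: Node) -> List[int]:
    """Return the 1-based indices of the nodes this node actually reads from."""
    return [c for c in node.connections if c != 0]


def mutate(node: Node, n_nodes: int, mu: float, rng: random.Random) -> Node:
    """Return a copy in which each encoding integer is resampled with
    probability mu (uniform resampling is an assumption)."""
    conns = [
        rng.randint(0, n_nodes) if rng.random() < mu else c
        for c in node.connections
    ]
    fid = rng.randrange(NUM_FUNCTIONS) if rng.random() < mu else node.function_id
    return Node(fid, conns, node.state)
```

Using 0 as the "no input" value lets mutation add or remove connections simply by changing an integer, so K can vary per node without changing the length of the encoding.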

11 Fuzzy DGP-XCSF Experimentation 
11.1 2-D Continuous Gridworld Environment 

The 2-D Continuous Gridworld environment [8] is a two-dimensional environment 
wherein the current state is a real-valued coordinate (x, y) ∈ [0, 1]^2. The agent is 
initially randomly placed within the grid and attempts to find the shortest path to the goal, 
located in the upper right corner; more specifically, in this paper the goal is found when 
x + y > 1.9, at which point the agent is given a fixed reward of 1000; otherwise 0 is 
given. Any action that would take the system outside of the environment moves the 
system to the nearest boundary. A teletransportation mechanism is employed whereby 
a trial is reset if the agent has not reached the goal state within 500 movements. As 
actions, the agent may choose one of four possible movements (north, south, east, or 
west), each of which is a step of size s = 0.05. The optimal number of steps is thus 18.6. 
The continuous state space, combined with the long sequence of actions required to 
reach the goal, makes the Continuous Gridworld one of the most challenging multistep 
problems hitherto considered by LCS [54]. 
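
The environment dynamics described above can be captured in a short sketch. The constants follow the text (step size 0.05, goal when x + y > 1.9, reward 1000, reset after 500 moves); the function and variable names are illustrative only, and "nearest boundary" is interpreted here as clamping each coordinate to [0, 1].

```python
import random
from typing import Callable, Tuple

State = Tuple[float, float]

STEP = 0.05
MAX_MOVES = 500
GOAL_REWARD = 1000.0
MOVES = {
    "north": (0.0, STEP),
    "south": (0.0, -STEP),
    "east": (STEP, 0.0),
    "west": (-STEP, 0.0),
}


def clamp(v: float) -> float:
    """Keep a coordinate inside [0, 1]."""
    return min(1.0, max(0.0, v))


def step(state: State, action: str) -> Tuple[State, float, bool]:
    """Apply one movement; return (next_state, reward, goal_reached)."""
    dx, dy = MOVES[action]
    x, y = clamp(state[0] + dx), clamp(state[1] + dy)
    done = x + y > 1.9
    return (x, y), (GOAL_REWARD if done else 0.0), done


def run_episode(policy: Callable[[State], str], rng: random.Random) -> Tuple[int, float]:
    """Run one trial from a random start; return (moves taken, final reward).
    The trial ends without reward if the goal is not reached in MAX_MOVES."""
    state = (rng.random(), rng.random())
    for moves in range(1, MAX_MOVES + 1):
        state, reward, done = step(state, policy(state))
        if done:
            return moves, reward
    return MAX_MOVES, 0.0
```

A hand-written policy such as `lambda s: "north" if s[1] < 1.0 else "east"` always reaches the goal well within the 500-move limit, which gives a simple sanity check before plugging in a learned policy.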

Figure 9 shows the performance of fDGP-XCSF in the Continuous Gridworld 
environment using the same parameters used by [54], except that here P = 20,000 and 
N_init = 5 (2 inputs, 2 outputs, 1 match node). From Figure 9a it can be seen that an 
optimal solution is learnt around 30,000 trials, which is slower than XCSF with 
interval-conditions (~15,000 trials, P = 10,000) [54], but is similar in performance to an 
MLP-based neural-XCSF [36]. The average mutation rate within the networks (see Figure 9b) 
declines rapidly from 40% to 5% after 10,000 trials and then declines at a slower rate until 
reaching a minimum of around 2.5% after 50,000 trials. The number of (non-unique) 
macro-classifiers (also Figure 9b) initially grows rapidly, reaching a peak at 10,000 before 
declining to around 6,900. Furthermore, from Figure 9c it can be seen that the average 
number of nodes in the fuzzy logic networks increases from 5 to 7.1 and the average 
number of connections within the networks remains near static around 2. Additionally, 
the average value of W remains static around 10, while the value of T increases slightly, 
on average, from 26 to 27. 

Fig. 9: fDGP-XCSF Continuous Grid(0.05) performance. 



11.2 Continuous-action Frog Problem 



The Frog Problem [107,108] is a single-step problem with a non-linear 
continuous-valued payoff function in a continuous one-dimensional space. A frog is given the 
learning task of jumping to catch a fly that is at a distance, d, from the frog, where 
0 ≤ d ≤ 1. The frog receives a sensory input, x(d) = 1 − d, before jumping a chosen 
distance, a, and receiving a reward based on its new distance from the fly, as given by: 



In the continuous-action case, the frog may select any continuous number in the 
range [0,1] and thus the optimal achievable performance is 100%. 
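
A minimal sketch of the task follows. It assumes the piecewise-linear payoff commonly given for the Frog Problem, P(x, a) = x + a if x + a ≤ 1 and 2 − (x + a) otherwise; that form is an assumption here and should be checked against [107,108]. The function names are illustrative.

```python
def sensory_input(d: float) -> float:
    """Sensory input x(d) = 1 - d for a fly at distance d in [0, 1]."""
    return 1.0 - d


def payoff(x: float, a: float) -> float:
    """Assumed payoff: rises linearly until the jump exactly reaches the fly
    (x + a = 1), then falls linearly as the frog overshoots."""
    total = x + a
    return total if total <= 1.0 else 2.0 - total


if __name__ == "__main__":
    d = 0.3
    x = sensory_input(d)
    for a in (0.0, 0.3, 0.8):
        print(f"d={d}, a={a}: payoff={payoff(x, a):.3f}")
```

Under this assumed payoff the best action is a = d = 1 − x, which yields the maximum payoff of 1 and is consistent with an optimal achievable performance of 100%.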

Wilson [108] presented a form of XCSF where the action was computed directly as 
a linear combination of the input state and a vector of action weights, and conducted 
experimentation on the continuous-action Frog problem, selecting the classifier with 
the highest prediction for exploitation. Tran et al. [95] subsequently extended this by 
adapting the action weights to the problem through the use of an Evolution Strategy 
(ES). In addition to the action weights, a vector of standard deviations is maintained 
for use as the mutation step size by the ES. During exploration, the ES is applied to 
each member of [A] to evolve the action weights and standard deviations, where each 
rule functions as a single parent producing an offspring via mutation; the offspring is 
then evaluated on the current environment state and its fitness updated and compared 
with the parent; if the offspring has a higher fitness it replaces the parent, otherwise it is 
discarded. Moreover, the exploration action selection policy was modified from purely 
random to selecting the action with the highest prediction. After reinforcement updates 
and running the ES, the GA is invoked using a combination of mixed crossover and mu- 
tation. They reported greater than 99% performance after an averaged number of 30,000 
trials (P = 2000), which was superior to the performance reported by [108]. More re- 
cently, Ramirez-Ruiz et al. [79] applied a Fuzzy-LCS with continuous vector actions, 
where the GA only evolved the action parts of the fuzzy systems, to the continuous- 
action Frog problem, and achieved a lower error than Q-learning (discretized over 100 
elements in x and a) after 500,000 trials (P = 200). 

To accommodate continuous actions, the following modifications were made to 
fDGP-XCSF. Firstly, the output nodes are no longer discretized, instead providing a 
real-numbered output in the range [0,1]. After building [M] in the standard way, [A] 
is built by selecting a single classifier from [M] and adding matching classifiers whose 
actions are within a predetermined range of that rule's proposed action (here the range, 
or window size, is set to ±0.005). Parameters are then updated and the GA executed 
as usual in [A]. Exploitation functions by selecting the single 'best' rule from [M]; the 
following experiments compare the performance achieved using various criteria to 
select the best rule from the match set. The parameters used here are the same as used 
by [107,108] and [95], i.e., P = 2000, ν = 5, θ_GA = 48, θ_del = 50, ε_0 = 0.01, β = 0.2, 
η = 0.2, x_0 = 1. Only one output node is required and thus N_init = 3. 
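
The construction of [A] around a chosen rule's action can be sketched as follows. The `Classifier` class is a hypothetical stand-in holding only the fields needed here, and using the highest prediction as the exploitation criterion is just one possible choice of "best" rule, picked for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

WINDOW = 0.005  # action window size from the text


@dataclass
class Classifier:
    action: float      # real-valued action proposed by the rule, in [0, 1]
    prediction: float  # payoff prediction of the rule


def build_action_set(
    match_set: Sequence[Classifier], chosen: Classifier, window: float = WINDOW
) -> List[Classifier]:
    """[A] = members of [M] whose action lies within +/- window of the
    chosen rule's action (the chosen rule itself is always included)."""
    return [cl for cl in match_set if abs(cl.action - chosen.action) <= window]


def select_best(match_set: Sequence[Classifier]) -> Classifier:
    """Exploitation: pick a single 'best' rule from [M]; highest prediction
    is used here as an example criterion."""
    return max(match_set, key=lambda cl: cl.prediction)


if __name__ == "__main__":
    m = [
        Classifier(0.500, 10.0),
        Classifier(0.504, 20.0),
        Classifier(0.510, 30.0),
        Classifier(0.497, 5.0),
    ]
    a = build_action_set(m, m[0])
    print([cl.action for cl in a], select_best(m).action)
```

Grouping rules by proximity of their proposed actions lets the usual parameter updates and GA run in [A] even though no two rules need propose exactly the same real-valued action.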

Figure 10 illustrates the performance of fDGP-XCSF in the continuous-action Frog 
Problem. From Figure 10a it can be seen that greater than 99% performance is achieved 
in fewer than 4,000 trials (P = 2000), which is faster than previously reported results 
(>99% after 30,000 trials, P = 2000 [95]; >95% after 10,000 trials, P = 2000 [108]), 
and with minimal changes resulting in none of the drawbacks; i.e., exploration is here 
conducted with roulette wheel on prediction instead of deterministically selecting the 
highest predicting rule, an approach more suitable for online learning. Furthermore, in 
[95] the action weights update component includes the evaluation of the offspring on 
the last input/payoff before being discarded if the mutant offspring is not more accurate 
than the parent; therefore additional evaluations are performed which are not reflected 
in the number of trials reported. 

From Figure 10b it can be seen that the average number of (non-unique) 
macro-classifiers rapidly increases to approximately 1400 after 3,000 trials, before converging 
to around 150; this is more compact than XCSF with interval conditions (~1400) [108], 
showing that fDGP-XCSF can provide strong generalisation. In addition, the networks 
grow, on average, from 3 nodes to 3.5, and the average connectivity remains static 
around 1.9. The average mutation rate declines from 50% to 2% over the first 15,000 
trials before converging to around 1.2%, and the average value of T increases from 
28.5 to 31.5. 

Fig. 10: fDGP-XCSF Continuous-action Frog Problem performance. 



12 Conclusions 



This paper has explored examples of a temporally dynamic graph-based representation 
updated with asynchronous parallelism (DGP). The DGP syntax presented consists of 
each node receiving an arbitrary number of inputs from an unrestricted topology (i.e., 
recursive connections are permitted), and then performing an arbitrary function. The 
representation is evolved under a self-adaptive and open-ended scheme, allowing the 
topology to grow to any size to meet the demands of the problem space. 



In the discrete case, DGP is equivalent to a form of Random Boolean Network 
(RBN). It was shown that the XCSF Learning Classifier System is able to design ensem- 
bles of asynchronous RBN whose emergent behaviour can collectively solve discrete- 
valued computational tasks under a reinforcement learning scheme. In particular, it was 
shown possible to evolve and retrieve the content-addressable memory existing as lo- 
cally stable limit points (attractors) within the asynchronously (randomly) updated net- 
works when the final node states from the previous match processing cycle become 
the starting states for the next environmental input. Furthermore, it was shown that the 
parameters controlling system sampling of the networks' dynamical behaviour can be 
made to self-adapt to the temporal complexities of the target environment. The intro- 
duced system thus does not need prior knowledge of the dynamics of the solution net- 
works necessary to represent the environment. In particular, the representation scheme 
was exploited to solve the Woods102 non-Markov maze (i.e., without extra mechanisms), 
a maze which has only previously been solved by LCS using an explicit 8-bit 
memory register. 

A significant advantage of the memory inherent within DGP is that each rule/network's 
short-term memory is variable-length and adaptive, i.e., the networks can adjust 
the memory parameters, selecting within the limits of the capacity of the memory, what 
aspects of the input sequence are available for computing predictions. In addition, as the 
topology is variable-length, the maximum size of the short-term memory is open-ended, 
increasing as the number of nodes within the network grows. Thus the maximum size 
of the content-addressable memory does not need to be predetermined. 

Subsequently, the generality of the DGP scheme was further explored by replacing 
the selectable Boolean functions with fuzzy logical functions, permitting the applica- 
tion to continuous-valued domains. Specifically, the collective emergent behaviour of 
ensembles of asynchronous Fuzzy Logic Networks was shown to be exploitable in 
solving continuous-valued input-output reinforcement learning problems, with similar 
performance to MLP-based neural-XCSF in the continuous-valued multi-step Grid en- 
vironment and superior performance to those reported previously in the Frog Problem. 

Current research is exploring the possibilities of DGP as a general representation 
scheme by which to solve complex problems with LCS. 